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Abstract 

Contemporary weather forecasts are typically based on ensemble prediction systems, which con¬ 
sist of multiple runs of numerical weather prediction models that vary with respect to in the initial 
conditions and/or the the parameterization of the atmosphere. Ensemble forecasts are frequently 
biased and show dispersion errors and thus need to be statistically postprocessed. However, current 
postprocessing approaches are often univariate and apply to a single weather quantity at a single 
location and for a single prediction horizon only, thereby failing to account for potentially crucial 
dependence structures. Non-parametric multivariate postprocessing methods based on empirical 
copulas, such as ensemble copula coupling or the Schaake shuffle, can address this shortcoming. A 
specific implementation of the Schaake shuffle, called the SimSchaake approach, is introduced. The 
SimSchaake method aggregates univariately postprocessed ensemble forecasts using dependence 
patterns from past observations. Specifically, the observations are taken from historical dates at 
which the ensemble forecasts resembled the current ensemble prediction with respect to a specific 
similarity criterion. The SimSchaake ensemble outperforms all reference ensembles in an applica¬ 
tion to ensemble forecasts for surface temperature from the European Centre for Medium-Range 
Weather Forecasts. 

Keywords: empirical copula, ensemble copula coupling, probabilistic weather forecasting, Schaake 
shuffle, similarity criterion, statistical ensemble postprocessing 


1 Introduction 


Contemporary weather forecasts are typically constructed from ensemble prediction systems, which 
have run operationally since the early 1990s. An ensemble consists of multiple runs of numerical 
weather prediction models which vary with respect to the initial conditions and/or the parameteriza¬ 
tion of the atmosphere. Consequently, ensembles take account of the two major sources of uncertainty 


(Gneiting and Raftery 

2005 

Leutbecher and Palmer 

2008 

Palmer 

2002 

|. Ensemble forecasts are 

Frequently biased and show dispersion errors ( 

Hamill and Coluccij 199? 

|. Thus, they require statistical 


postprocessing to realize their full capability. During the last decade, several ensemble postprocess¬ 
ing methods have been developed. Examples are (variants of) the ensemble model output statistics 
(EMOS) (jCneiting et al. 2005 among others) approach, which is also known as non-homogeneous 


1 

































regression, or Bayesian model averaging (BMA) (Raftery et al. 2005 among others). However, EMOS 
and BMA, as well as other postprocessing methods, only apply to a single weather variable at a sin¬ 
gle location and for a single prediction horizon. Thus, they fail to account for spatial, temporal or 
inter-variable dependence structures, which are crucial in many applications such as flood warning 


(Schaake et al. 


2010), winter road maintenance (Berrocal et al. 2010) or the handling of renewable 


energy sources (Pinson 2013). In recent years, there has been a keen interest in the development of 
multivariate postprocessing methods being able to address this shortcoming, and a lot of effort has 
been invested to that end. For instance, variants and modifications of EMOS and BMA, respectively, 
that can handle spatial ([Berrocal et al. 2007, 2008: [Feldmann et al. 2015) or inter-variable (Baran 


and Moller 2015; Moller et al., 2013; Schuhen et al. 2012 Sloughter et al. 2013) dependencies are 


available. Furthermore, there are methods to capture temporal dependencies of consecutive lead times 
in postprocessed predictive distributions (Pinson et al. 2009; Schoelzel and Hense, 2011). All these 


multivariate postprocessing methods are parametric and work well in rather low dimensions and in 
settings in which the involved correlation matrix can be taken to be highly structured. However, 
they model one type of dependence (spatial, inter-variable or temporal) only and appear to be inade¬ 
quate when considering high-dimensional situations, in which no particular structure can be exploited. 
These issues can be addressed using non-parametric techniques based on the use of empirical copulas 


Schefzik 

2015b 

). Following 

Wilks 

(2014), an empirical copula ( 

Deheuvels 

1979 

Riischendorf 

2009 


can be interpreted as a dependence template induced by a specific discrete multivariate data set. It can 
be employed to transfer a particular dependence pattern to samples which are drawn independently 
from a collection of univariate marginal distributions (Wilks, 2014). For example, in the ensemble 


copula coupling (ECC) approach of Schefzik et al. (2013), such an empirical copula is derived from the 
unprocessed ensemble forecast and then applied to samples from univariate postprocessed predictive 
distributions, which can be gained via the standard EMOS or BMA approaches. This is equivalent 
to ordering these samples according to the rank dependence structure of the raw ensemble, thereby 
capturing the spatial, temporal and inter-variable flow dependence (Schefzik et al. 2013). Proceed¬ 


ing in a similar manner, the Schaake shuffle (Clark et al., 2004) employs an ordering based on past 


observations from a historical data archive. Consequently, the corresponding empirical copula in the 
Schaake shuffle is induced by an observational database rather than by an ensemble forecast. How¬ 
ever, the standard Schaake shuffle fails to condition the multivariate dependence pattern on current or 


predicted atmospheric states. To address this shortcoming, Clark et al. (2004, page 260) proposed to 
develop an extension thereof, driven by the idea 

“to preferentially select dates from the historical record that resemble forecasted atmo¬ 
spheric conditions and use the spatial correlation structure from this subset of dates to 
reconstruct the spatial variability for a specific forecast.” 


Inspired by this suggestion, a specific implementation of the Schaake shuffle, referred to as the Sim- 
Schaake approach, is introduced in this paper. Essentially, the SimSchaake method proceeds like the 
Schaake shuffle, but the observations determining the dependence structure are taken from historical 
dates at which the ensemble forecasts resembled the current ensemble prediction with respect to a 
specific similarity criterion. 

The remainder of the paper is organized as follows. In Section we first discuss the general setting of 
empirical copula-based ensemble postprocessing and review ECC and the Schaake shuffle as reference 
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examples. Then, we develop the SimSchaake approach. In Section this new method is evaluated 
and compared to the reference methods in a case study. The paper closes with a discussion in Section 

H 


2 Empirical copula-based ensemble postprocessing methods 


Copulas are valuable and established tools for the modeling of stochastic dependence (Joe 2014 


Nelsen 20061. They have been successfully employed in numerous application areas. A copula is an 


L-variate cumulative distribution function (CDF) with standard uniform univariate marginal CDFs, 


where L G N, L > 2. As is manifested in the famous Sklar’s theorem (Sklar 1959), a copula C links a 


multivariate CDF H to its univariate marginal CDFs Fi,..., Fl via the decomposition 

H{ui, ...,ul) = C{Fi{ui),. . .,Fl{ul)) 

for tti,..., Mi G M. In a multivariate postprocessing setting, the sought multivariate CDF H can thus 
be constructed by specifying both the univariate marginal CDFs Fi,..., Fl and the copula C model¬ 
ing the dependence. The CDFs Fi,..., Fl can be obtained by common univariate postprocessing for 
each location, weather variable and look-ahead time individually, for instance performed via EMOS or 
BMA. For the choice of C, a prominent example is the Gaussian copula, which has been applied to 


a wide range of problems in climatology, meteorology and hydrology ( 

Genest and Favre 

2007 

Moller 

et al. 

2013 

Pinson 

2012 

Schoelzel and Friederichs 

2008 

Schuhen et al. 

2012 

1. 


In this paper, we focus on the case in which Fi,..., Fl are the empirical CDFs given by samples 


from univariate postprocessed CDFs and C is taken to be an empirical copula (Deheuvels, 1979 


Riischendorf 2009), Ej^. According to Wilks (2014), an empirical copula can be considered a multi 


variate dependence template derived from a specific discrete data set. To describe this formally, let 
z := {{zl ,..., z^f ),..., {z ^,..., z^)'\ be a data set consisting of L tuples of size N with entries in M. 
Moreover, let rank(z^) denote the rank of in (zf ,..., for n G {1,..., A^} and ^ G {1,..., L}, 
assuming for simplicity that there are no ties. Then, the empirical copula F]\r induced by the data set 
z is given by 


iV’"-’Ar 


N 


1 

2^ ^{rank(zl)<il,...,rank(z^)<iL} 
n=l 


Eh 

n=l 
N L 

En^i 


(1) 


TV ^ '^{>'ank(4)<V} 

n=l£=1 

for integers 0 < 4,... ,4 E N, with 1a denoting the indicator function whose value is 1 if the event 
A occurs, and zero otherwise. 


In this section, we review ensemble copula coupling (ECC) (Schefzik et al. 2013) and the Schaake shuffle 


(Clark et al., 2004) as reference methods within the general frame of empirical copula-based multivari¬ 


ate ensemble postprocessing (Schefzik 2015b). In addition, we develop the SimSchaake method as a 


specific implementation of the Schaake shuffle. While ECC and the Schaake shuffle have been used as 


Vrac and Friederichs 

2015 

Wilks 

2014 


a benchmark in several papers (Schaake et al. 2007 Scheuerer and Hamill 2015 Voisin et al. 2011 


the SimSchaake method is new. 
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To allow for a formal description of the approaches, let us first set some notation which will be used 
throughout the whole section. Let i G {1,...,/} be a weather variable, j G J} a location 

and k G iL} a look-ahead time. For simplicity, let I := (i,j,k) and £* := {i,j) denote the 

corresponding multi-indices, and let L := J x J x iL and L* := I x J, respectively. Moreover, let M 
denote the number of raw ensemble members, and N the desired number of members the postprocessed 
ensemble shall consist of. 

in general, multivariate empirical copula-based ensemble methods to postprocess a raw ensemble fore¬ 
cast X := {(x{,..., x \^),..., (xf,..., x^)} initialized at a specific date to proceed according to the 
following scheme. 

Scheme 1. (Empirical copula-based multivariate ensemble postprocessing) 

1. To implement dependence structures, derive an empirical copula E]\f from a suitable data set 
z := {(4, • • •, zlf ),..., (zf,..., via (Q. 

Equivalently, derive the univariate order statistics < • • ■ < for i G {1,... ,L}, inducing 
the permutation 7r£(n) := rank( 2 ;^) for n G {1,... ,N}. If there are ties, let them be resolved at 
random. 


2. For each margin i, perform univariate postprocessing of the raw ensemble forecast xf,... ,x^ 
based on past forecasts and observations as training data, and obtain a postprocessed predictive 
CDF Fi in each case. 

3. Draw a sample x^,...,x^ from each marginal CDF F). This can be for instance and most 
conveniently done by taking equidistant quantiles of of the form 


Xi .— 


1 


N + 1 


. . . , Xjy 


N 


N + l 


( 2 ) 


4. Apply the empirical copula F]\f to the samples from step 3. 

Equivalently, arrange these samples with respect to the rank dependence structure of the chosen 
data set z in step 1. Using the permutation ire, the final postprocessed ensemble xf,... ,x^ for 
each margin I is thus given by 


x\ := X 


Ddi))’ 


^MN)y 


2.1 Reference methods 


Now we review ECC (Schefzik et al. 20131 and the Schaake shuffle (Clark et al. 20041 as empirical 


copula-based postprocessing techniques within the scheme described above. 


2.1.1 Ensemble copula coupling (ECC) 


In the ECC approach (Schefzik et al. 20131, the data set specifying the dependence structure is given 


by the raw ensemble forecast, that is, we have z = x = {(x{,..., x \^),..., (xf,..., x^)} in Scheme]^ 
Consequently, the sample size in step 3 and hence the size of the final postprocessed ensemble in step 
4 of Scheme is restricted to equal that of the unprocessed ensemble, that is, N = M. 
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The ECC procedure operates under a perfect model assumption, implictly assuming that the ensemble 
is capable to represent actual spatial, inter-variable and temporal dependence structures adequately. 
This may or may not be expected, but surely does not hold each and every day. Moreover, ECC only 
applies to ensembles whose members can be considered exchangeable, that is, statistically indistin¬ 
guishable. 


Schefzik et al. (2013) and Schefzik (2015a I show that ECC provides an overarching frame for seemingly 


unrelated approaches scattered in the literature such as those of Pinson 

(2012) 

Roulin and Vannitsem 

(2012 

), 

Flowerdew] ( 

2014 

) or 

Van Schaeybroeck and Vannitsem 

(2014 

, to name just a few. It has 


been used as a reference technique in the recent papers by Wilks (2014), Feldmann et al. (2015) and 


Scheuerer and Hamill (2015) 


2.1.2 The Schaake shuffle 


In contrast to ECC, the data set determining the dependence pattern in the Schaake shuffle (Clark 


et al.[ 20041 is not given by the raw ensemble forecasts, but by past observations 
y := {(?/|,..., y}^), ..., (yf*, • • ■, yj^)} taken from N different dates of a historical archive. That is, 
observations from the same N dates are employed for all locations and weather variables throughout 
the procedure of the Schaake shuffle for a fixed forecast instance. Hence, we have z = y in Scheme 
In particular, N does not need to equal the raw ensemble size M. 


In the original implementation of the Schaake shuffle by Clark et al. (20041, the corresponding N dates, 
from which the observations are taken, are chosen from all years in the historical record, except for 
the year of the forecast of interest, and lie within seven days before and after the verification date t, 
regardless of the year. A more general implementation of the Schaake shuffle may use observations 
from arbitrary (or randomly selected) past dates in the whole historical record. We will refer to this 
procedure as the Random Schaake method in the case study in Section 


The Schaake shuffle has been employed successfully in numerous applications ( 

Robertson et al-l 

2013 

Schaake et al. 

to 

o 

o 

Verkade et al.l 20131 Voisin et al. 

o 

1 

o 

CN 

2011[|Vrac and Friederichsl 2015t|Wilks 

2014 

1. 


2.2 The SimSchaake approach as a similarity-based implementation of the Schaake 
shuffle 

As we have seen, the Schaake shuffle generates a postprocessed ensemble inheriting the rank dependence 
structure from historical observations. However, its standard implementations fail to condition the 
multivariate dependence pattern on current or predicted atmospheric states. Inspired by the quote 
of Clark et al. (2004) mentioned in the introductory Section we address this challenge by linking 


the Schaake shuffle to similarity- or analog-based ensemble methods. In such approaches, one seeks 
ensemble forecasts in an archive of past data that are similar to the current one. The basic idea is that 
the realizing states of the atmosphere corresponding to such an analog ensemble can be assumed to 


be similar to the stat 
popular and importan 

e to be predicted (Hamill and Whitaker 

2006 

. These techniques have become 

t, as for instance witnessed by the papers 

of Bannayan and Hoogenboom ( 

2008), 

Klausner et al. (2009 

, Hall et al. (2010), Delle Monache et al. 

(2011 

), Messner and Mayr (201 

L) and 

Delle Monache et al. ( 

2013), and further by recent research (Alessandrini et al. 2015 Junk et al. 

2015 


Vanvyve et al. 2015). In this context, the question of the choice of appropriate similarity criteria in a 
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nearest neighbor sense arises, with the papers above providing some proposals. 

The following approach, which will be referred to as the SimSchaake method in what follows, combines 
the idea of searching for similar ensembles and the Schaake shuffle. Like the Schaake shuffle and in 
contrast to standard ECC, it can be applied to any ensemble, regardless of whether it consists of 
exchangeable or non-exchangeable members, and the size N of the final postprocessed ensemble is not 
restricted to equal the raw ensemble size M. To describe the new SimSchaake approach formally in 
detail, let A be the length of the training period required for the univariate ensemble postprocessing, 
to the initialization date of the ensemble forecast, and t the verification date. Let further D be the 
number of dates in the past of to for which ensemble forecast and observation data is available. For 
the feasibility of the SimSchaake approach, it is required that ensemble forecast and observation data 
is available for at least max{A^, A} dates in the past of to, that is, D > max{A^, A}. Considering 
the prediction horizon to be fixed, the SimSchaake approach then proceeds according to the empirical 
copula-based postprocessing as in Scheme where the data set z in step 1 is derived as follows. 

i. For a fixed margin C, let := ..., denote the (possibly standardized) M-member 

raw ensemble forecast valid on date r. Let further 


:= 


,x 


.'T'l — 


= ((a; 


1,T 
1 ) ' 


,X 


1,T 

M 




2,r 


) 


UCi 


L*,r 


)) 


denote the corresponding {L* x M)-tuple consisting of the M-member ensemble forecasts of all 
L* margins, that is, combinations of weather variable and location. 

If weather variables with distinct units or magnitudes are involved, the components of x^ ’’’ 
should be standardized for each multi-index i*. 


ii. For each date td in the past of the initialization date to, where d G {1,..., D}, compute a suitable 
fixed similarity criterion 

^td . ■^L*xM ^ ■g^L*xM [0,oo), (x^x*'*) !-)• A*'^(x*,x*‘'), 

between the actual forecast x* valid on the verification date t and the forecast x*'* valid on the 
date td- The similarity criterion is taken to be negatively oriented, that is, the lower the 
similarity criterion the more similar the ensemble forecasts. A similarity criterion value of exactly 
zero indicates that the ensemble forecasts are identical. 


iii. Choose those N dates ti, ... ,r 7 v G {ti,... for which the data is most similar to that for 
the date t in the sense that the corresponding values of A”^" for n G {1,..., A} are the smallest 
among the values of A*'* for d G {1,..., D}. 

Note that the information of all multi-indices i* simultaneously is employed to determine the 
dates Ti,..., Tat. 

iv. For each margin £*, let ’’’i,... denote the corresponding N historical verifying obser¬ 
vations valid on the dates ri,..., tat determined in step iii. For simplicity, write := for 

n G {1,..., A} and build the data vector := (yf ,..., y^) for the verification date t. The 
data set z in step 1 from Scheme is then obtained by aggregating the historical observation 
databases of all margins £*, that is, z = (y^,..., y^ ). From this template z, the empirical copula 
related to the SimSchaake approach is derived via Q. 
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Figure 1: Scheme of the SimSchaake approach 


Having obtained the data set z and the respective empirical copula according to the above procedure, 
steps 2 to 4 from Scheme are performed in order to generate the hnal postprocessed SimSchaake 
ensemble. A scheme of the SimSchaake approach is given in Fig. 

An appropriate choice of the similarity criterion A*'* in step ii, which is then consistently used through¬ 
out the whole SimSchaake approach, is crucial. We here consider 




X*, : = 






,t _ o£*,tdi2 




£* = 1 


e*=i 


(3) 


where 


-i* T 

X ’ 


M 

-T 

M ^ 


and 


£* T 

s ’ := 


m=l 


\ 


M 

M-l ^ 

m=l 




denote the empirical mean and standard deviation, respectively, of the ensemble forecast x^ ’’’ for the 
hxed multi-index at date r. As it does not depend on how the ensemble members are labeled, 
the similarity criterion in (|^ can be applied to ensembles consisting of exchangeable members. In 
particular, it is suitable for the temperature predictions from the European Centre for Medium-Range 
Weather Forecasts (ECMWF) used in the case study in the next section. While the use of the empirical 
mean to some degree accounts for seasonal aspects when considering temperature forecasts only, the 
empirical standard deviations reflect the uncertainties within the ensemble forecasts. Thus, these issues 
are addressed by (|^ when comparing two ensemble forecasts. Alternative proposals for similarity 
criteria, also for the case of ensembles with non-exchangeable members, can be found in the references 
mentioned before. 

It is possible to transform the values of a similarity criterion A*'* from [0, oo) to (0,1] by employing 
the standardization A**^ := exp(—A**^). With respect to A*'*, similarity values near to 1 indicate a very 
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high similarity between x* and , while similarity values near to 0 point at no similarity. Accordingly, 
if using A*'*, we then have to choose the dates n,..., ttv corresponding to the highest, and not to the 
lowest, values of in step iii. 

As mentioned, the SimSchaake approach addresses two shortcomings in the standard ECC method. 
First, it can also be applied to ensembles consisting of non-exchangeable members, as the reordering 
is not based on the ensemble forecasts, but on observations. Second, with the SimSchaake technique 
we can principally create ensembles of arbitrary size, as long as there are sufficiently many historical 
observations in the past. In contrast to ECC, the postprocessed ensemble is thus not restricted to have 
the same number of members as the raw ensemble. 

In Fig. 1^ an illustration of the three empirical copula-based postprocessing methods presented in this 
section - ECC in the first row, the Schaake shufffe in the second row and the SimSchaake method in 
the third row - is given. We consider 24 hour ahead surface temperature forecasts (in degrees Celsius; 
°C) at Vienna (Austria) and Bratislava (Slovakia) valid on the hot summer day of 9 July 2011 at 1200 
UTC. In the subfigures, the ensemble forecast is indicated by the dots, the verifying observation by 
the cross, and past observations from a historical database by the tetragons. As the ECMWF raw 
ensemble used here has size M = 50, the illustrations are for convenience based on N = M = 50- 
member ensembles for both ECC and the Schaake shufHe-based methods. The first column of Fig. 
shows the corresponding database that is used to determine the dependence structure and the empirical 
copula, respectively, of the postprocessing approach: the ECMWF raw ensemble in the case of ECC, 
past observations on random dates for the Random Schaake method, and past observations on specihc 
dates selected according to the similarity criterion in ([^ in the case of the SimSchaake approach. 
The second column shows three times the same individually postprocessed ensemble forecast. This 
is generated by randomly pairing the equidistant quantiles ([^ from the predictive CDFs obtained by 
univariate EMOS postprocessing (Gneiting et al. 20051 at Vienna and Bratislava separately. Such 


an ensemble forecast does not take account of the pronounced positive spatial correlation structure. 
In the third column, the final postprocessed ensemble forecasts are shown, obtained by applying the 
corresponding empirical copula to the samples derived from the individual EMOS postprocessing. 
These empirical copula-based postprocessed ensembles have the same marginal distributions as the 
individual EMOS postprocessed ensemble, as witnessed by the respective histograms at the top and 
to the right of each subhgure. Additionally, they exhibit the same spatial correlation pattern as the 
underlying database specified in the first column, thus respecting dependencies. 

3 Case study 

3.1 Setting 

In our case study, we employ predictions of the European Centre for Medium-Range Weather Forecasts 
(ECMWF) core ensemble ( ]ECMWF Directorate 20121, whose M = 50 members can be considered 
exchangeable. We focus on 24 hour-ahead temperature forecasts jointly at Vienna (Austria), Bratislava 
(Slovakia) and Budapest (Hungary), and consequently on spatial dependencies only. For each location, 
the ground truth is given by the corresponding surface synoptic observations (SYNOP). The approxi¬ 
mate distance from Vienna to Bratislava is 50 kilometers, from Bratislava to Budapest 170 kilometers, 
and from Vienna to Budapest 210 kilometers. There are pronounced positive pairwise correlations be- 
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(a) Database for Empirical Cop¬ 
ula (Dependence Model) 


ECMWF Raw Ensemble 



29 30 31 32 33 34 


(b) Individually EMOS Postpro- 
cessed Ensemble 



28 29 30 31 32 33 34 35 

Vienna 


(c) EMOS and Empirical 
Copula-Based Postprocessed 
Ensemble 


EMOS-ECC Ensemble 



28 29 30 31 32 33 34 35 



Figure 2: Illustration of ECC (first row), the Random Schaake method (second row) and the Sim- 
Schaake method (third row) using 24 hour ahead temperature forecasts (in °C) at Vienna and Bratislava 
valid on 9 July 2011, 1200 UTC. Ensemble forecasts are indicated by the dots, and the verifying ob¬ 
servation by the cross. The tetragons symbolize past observations from a historical database. At the 
top and to the right of each subfigure, the corresponding marginal histograms are shown. 


9 













tween the observations at the three different locations, which are the stronger the closer the respective 
stations are. These correlation patterns are basically reffected well in the ensemble forecasts. 

We consider those 3985 test days during the period from 1 January 2003 to 31 December 2013 for which 
all required forecast and observation data is available at all the three stations. In our case study, all 


data is valid on 1200 UTC. Univariate postprocessing is performed via EMOS (Gneiting et al. 20051 


using the R package ensembleMOS (R Core Team, 2014 Yuen et al. |2013| ), employing a rolling window 
consisting of the last A = 50 days before the verification day as training period. For each marginal 
EMOS postprocessed predictive CDF Fg, we follow the quantization (|^ and take the N equidistant 
{n/{N + l))-quantiles, where n = 1,..., A^, as samples, focusing on the cases of A" = M = 50 and 
N = 80, respectively, here. 

For these desired ensemble sizes, we assess and compare the predictive performances of 


• the ECMWF raw ensemble, 

• the Individual EMOS ensemble, which assumes independence, 

• the EMOS-ECC ensemble, which assumes a dependence structure according to that in the raw 
ensemble, 

• the EMOS-Random Schaake ensemble, which assumes a dependence structure according to ran¬ 
domly selected historical observation data, and 

• the EMOS-SimSchaake ensemble, which assumes a dependence structure according to specific 
historical observation data valid on dates for which the ensemble forecast resembled the current 
one with respect to similarity criterion ([^. 

Obviously, results for the raw and the EMOS-ECC ensemble can only be reported for the case of 
N = M = 50, whereas the other ensembles are additionally evaluated for the final ensemble size of 
N = 80. For the two approaches employing the Schaake shuffle notion, the past dates from which 
the corresponding verifying observations are taken, are searched for among all available historical 
data, where ensemble forecast and observation data is available from 1 January 2002 to 31 December 
2013. Hence, the database used for the Schaake shuffle-based methods grows over time. Recall that 
the EMOS-Random Schaake method just randomly selects those past dates, whereas the EMOS- 
SimSchaake approach chooses them based on the ensemble similarity criterion ([^. 


3.2 Evaluation tools 


A probabilistic forecast distribution or an ensemble forecast, respectively, should be as sharp as possible, 
subject to calibration, which refers to statistical consistency between the forecasts and the observation 
(Gneiting et al. 2007). To assess the predictive performances of our different ensembles, several 


verification tools are available (Wilks, 2011). 


In univariate settings, calibration can be checked via the verification rank histogram (Anderson 1996 


Hamill, 2001; Talagrand et al. 1997). As we focus on the evaluation of multivariate quantities in this 


paper, ensemble calibration in our case study is checked via the multivariate (Gneiting et al., 2008) 


band depth and average rank histograms (Thorarinsdottir et al. 2015). When an ensemble forecast 


is calibrated, the multivariate, band depth or average rank, respectively, is uniformly distributed. 
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Calibration can thus be diagnosed by compositing over forecast cases, plotting the corresponding 
multivariate, band depth or average rank histogram, respectively, and checking for deviations from 
uniformity, that is, flatness of the histogram. For an interpretation of the different shapes a rank 


histogram for multivariate quantities can exhibit, see Gneiting et al. (2008) and Thorarinsdottir et al. 


(2015). 


The overall forecast skill can be assessed via proper scoring rules (Gneiting and Raftery 2007), which 


are able to assess calibration and sharpness simultaneously and are taken to be negatively oriented 
here, that is, the lower the score the better the predictive performance. A widely used proper scoring 


rule for univariate quantities is the continuous ranked probability score (CRPS) (Gneiting and Raftery 


2007 Matheson and Winkler 1976) 


In this paper, we employ the energy score (ES) (Gneiting et al. 2008), which is the analog of the CRPS 


for multivariate quantities. For an A^-member ensemble forecast 

xi := (xj,..., xf),..., XAT := (x]v, • ■ •, x%) G 


and an observation 

y := e 

the ES is computed as 


ES 



n=l 


2iV2 


N N 

EEi 

i/=l n=l 


X,y - Xr, 


with II • II denoting the Euclidean norm. As the ES reveals weaknesses in detecting misspecifications 

Scheuerer and Hamill| |2015|), we additionally 


in the correlation structure (Pinson and Tastu 


2013 


consider the variogram score (VS) ([Scheuerer and Hamill 2015) to address this, which is given by 


vs = EE 


we\ ye - y\ 


e=i A=i 


1 

N 


N 

E 


- VT > x„ - X^ 


n=l 


where the Wi\s are (optional) non-negative weights. For the case study in this paper, in which we 
focus on spatial dependencies, we follow the suggestion of Scheuerer and Hamill (2015) and let the 
weights be proportional to the inverse spatial distances between the corresponding locations. That is, 
we choose 


wex := 


1 

dist(£,A) 


S dist{£,A) 

e,x=i ^ ^ 
e^x 


for i ^ \ and w^x '■= 0 for £ = X, with dist(t'. A) denoting the spatial distance between location I and 
location A, where all distances have to be measured in the same unit. For the specific implementation in 
our L = 3-dimensional setting here, we employ the distances between Vienna, Bratislava and Budapest 
as mentioned in the preceding subsection. 

In our case study, average scores over all forecast cases within our specific test period are reported. 
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Table 1: Average energy scores (ES) and variogram scores (VS) for 24 hour ahead temperature forecasts 
at Vienna, Bratislava and Budapest jointly over 3985 test days during the period from 1 January 2003 
to 31 December 2013. The scores for the Individual EMOS and the EMOS-Random Schaake ensembles 
are averaged over 100 runs. 




ES 

VS 

o 

II 

ECMWF Raw Ensemble 

2.241 

0.333 


Individual EMOS Ensemble 

1.976 

0.323 

II 

II 

cn 

O 

EMOS-ECC Ensemble 

1.957 

0.270 


EMOS-Random Schaake Ensemble 

1.998 

0.300 


EMOS-SimSchaake Ensemble 

1.952 

0.265 


Individual EMOS Ensemble 

1.971 

0.327 

II 

00 

o 

EMOS-Random Schaake Ensemble 

1.996 

0.300 


EMOS-SimSchaake Ensemble 

1.947 

0.266 


3.3 Results 


As we focus on the multivariate setting in this paper, we do not explicitly show the results for the 
univariate EMOS postprocessing at the three stations individually here. In a nutshell, the EMOS 
postprocessed ensemble forecasts exhibit a better predictive performance than the unprocessed raw 
ensemble predictions, in that they are better calibrated and have smaller CRPS values. 

In Table the average energy scores (ES) and variogram scores (VS) as overall performance mea¬ 
sures are shown. The results for the Individual EMOS and the EMOS-Random Schaake ensemble are 
averaged over 100 runs for each forecast instance, in order to account for randomizations. Precisely, 
for the Individual EMOS ensemble, the results for 100 different aggregations (that is, assignments of 
the member indices) of the equidistant quantiles obtained by the univariate EMOS postprocessing are 
averaged. In case of the EMOS-Random Schaake ensemble, the average is taken over 100 different se¬ 
lections of the random historical dates that are used to define the observation-based dependence model. 
Calibration is assessed via the multivariate, band depth and average rank histograms, respectively, in 
Figs. [3) |4] and [5] 

Both in the case of V = M = 50 and V = 80, all postprocessing methods outperform the raw ensemble 
in terms of the scores. The raw ensemble is clearly uncalibrated, more precisely underdispersed, as 
witnessed by the U-shaped multivariate and average rank histograms and the skewed band depth rank 


histogram with an overpopulation of the lowest ranks (Gneiting et al. 2008 Thorarinsdottir et al. 


20151. 


Considering the results for the postprocessed ensembles consisting of V = M = 50 members first, 
the Individual EMOS ensemble is uncalibrated, yielding U-shaped rank histograms. In the case of the 
band depth and average rank histograms, respectively, this points at an underestimation of the corre¬ 
lation structure (Thorarinsdottir et al., 20151, which is plausible, as this approach does not account 


for dependencies. With respect to the VS, the Individual EMOS ensemble performs worse than the 
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Figure 3: Multivariate rank histograms for 24 hour ahead temperature forecasts at Vienna, Bratislava 
and Budapest jointly over 3985 test days during the period from 1 January 2003 to 31 December 2013. 


ECMWF Raw Ensemble M=S0 Individual EMOS Ensemble N=M=50 EMOS-ECC Ensemble N=M=50 EMOS-Random Schaake Ensemble N=M=50 



1 11 21 31 41 51 1 11 21 31 41 51 1 11 21 31 41 51 1 11 21 31 41 51 

Band Depth Rank Band Depth Rank Band Depth Rank Band Depth Rank 

EMOS-SimSchaake Ensemble N=M=50 Individual EMOS Ensemble N=80 EMOS-Random Schaake Ensemble N=80 EMOS-SimSchaake Ensemble N=80 



1 11 21 31 41 51 1 11 21 31 41 51 61 71 81 1 11 21 31 41 51 61 71 81 1 11 21 31 41 51 61 71 81 

Band Depth Rank Band Depth Rank Band Depth Rank Band Depth Rank 


Figure 4: Band depth rank histograms for 24 hour ahead temperature forecasts at Vienna, Bratislava 
and Budapest jointly over 3985 test days during the period from 1 January 2003 to 31 December 2013 


13 






ECMWF Raw Ensemble M=SO 


1 11 21 31 41 51 

Average Rank 

EMOS-SimSchaake Ensemble N=M=50 


Individual EMOS Ensemble N=M=50 


1 11 21 31 41 51 

Average Rank 

Individual EMOS Ensemble N=80 


EMOS-ECC Ensemble N=M=50 



EMOS-Random Schaake Ensemble N=M=50 




IrrrrTTrTTTTTTrTMTTTl 


11 21 31 41 51 61 71 81 

Average Rank 



1 11 21 31 41 51 61 71 81 

Average Rank 



Figure 5: Average rank histograms for 24 hour ahead temperature forecasts at Vienna, Bratislava and 
Budapest jointly over 3985 test days during the period from 1 January 2003 to 31 December 2013 


other postprocessed ensembles, which assume specific correlation structures. In terms of the ES, the 
EMOS-ECC and the EMOS-SimSchaake ensembles outperform the Individual EMOS ensemble, while 
the distinctions are less pronounced than for the VS. This may be due to the discrimination inability 


of the ES with respect to correlations between different locations (Pinson and Tastu 2013 Scheuerer 
and Hamill 20151, as hinted at in the preceding subsection. 


Comparing the three empirical copula-based postprocessing methods taking account of dependence 
patterns, the EMOS-ECC ensemble is well calibrated, apart from a slight overpopulation of the lowest 
ranks in all rank histograms. In contrast, the calibration of the EMOS-Random Schaake ensemble is not 
that good, as witnessed by the inverse U-shaped band depth and average rank histograms, indicating 
an overestimation of the correlation structures (Thorarinsdottir et al. 20151. Moreover, the multivari¬ 
ate rank histogram of the EMOS-Random Schaake ensemble is skewed with too many low ranks. The 
EMOS-SimSchaake ensemble is calibrated best, with band depth and average rank histograms being 
close to uniform and an essentially flat multivariate rank histogram with a slight overpopulation of the 
lowest ranks only. The ranking of the three ensembles allowing for dependencies in terms of calibration 
is also reflected in the scores. Both for the ES and the VS, the EMOS-SimSchaake ensemble performs 
best, followed by the EMOS-ECC ensemble and finally the EMOS-Random Schaake ensemble. 

The results and conclusions on calibration described above continue to hold analogously for the 
extended Individual EMOS, EMOS-Random Schaake and EMOS-SimSchaake ensembles comprising 
N = 80 members. Similarly, the ranking of the predictive skill of these three extended postprocessed 
ensembles remains unchanged with respect to the ES and VS, respectively, compared to the case of 
N = M = 50, with the EMOS-SimSchaake ensemble still performing by far best. The ES and VS 
values of the N = 80-member ensembles are very similar to those of their counterparts in the case of 
N = M = 50. Although an extension of the ensemble size is generally useful, there is no pronounced 
need in our case here to increase the ensemble size oi N = M = 50, which appears to be already 
reasonably large. 

In a nutshell, the EMOS-SimSchaake ensemble based on the new method introduced in this paper 
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performs best among all reference ensembles, both with respect to calibration and in terms of scores. 
In particular, the EMOS-SimSchaake ensemble outperforms the EMOS-Random Schaake ensemble. 
Hence, there appears to be a clear benefit of using the specific past dates on which the ensemble 
forecasts resembled the current one to create the historical observation database modeling the depen¬ 
dencies, rather than picking these dates randomly. The EMOS-SimSchaake ensemble also outperforms 
the EMOS-ECC ensemble. 


4 Discussion 


We have discussed and compared empirical copula-based ensemble postprocessing methods that are able 
to account for dependencies. While ECC and the Schaake shuffle have been reviewed in a general frame, 
the SimSchaake scheme has been newly developed in this paper as a multivariate postprocessing tool. 
Essentially, the SimSchaake procedure aggregates samples from univariate postprocessed distributions, 
where the underlying dependence structure and the involved empirical copula, respectively, are derived 
from historical observations at dates in the past which showed a similar ensemble forecast to the 
current one. In our case study, the SimSchaake ensemble has performed best overall and better than 
the ECC ensemble, while having the benefit of a broader applicability, in that it can also be employed 
on ensembles comprising non-exchangeable members and is not restricted to have the same size as the 
raw ensemble. 

The SimSchaake method depends on the design of a suitable similarity criterion, where the choice 
of ^ has proven to be useful, yielding good results. However, the predictive performance of the 
SimSchaake ensemble might be improved by using a more sophisticatedly designed similarity criterion, 
perhaps including a suitable weighting function (or monotone transformations thereof) that accounts 
for seasonal aspects. Moreover, the similarity criterion in ^ is tailored to ensembles consisting of 
exchangeable members such as the ECMWF ensemble in our case study. For ensembles comprising 
non-exchangeable members, other criteria might provide more reasonable and better choices. 

As mentioned, a drawback of the standard ECC postprocessed ensemble employed here is that it is 
constrained to have the same size as the unprocessed ensemble, while the Schaake shuffle and the 
SimSchaake ensembles are not. However, there have been first attempts to design ECC-like ensembles 
having more members than the raw ensemble (Schefzik 2015a; Wilks 20141. Similar to the work in 


Wilks (20141, an issue for future work may be to design and conduct a further case study including 
these approaches to allow for a broader comparison of ECC- and Schaake shuffle-based concepts. 
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